We consider discrete denoising of two-dimensional data with characteristics that may be varying abruptly between regions. Using a quadtree decomposition technique and space-filling curves, we extend the recently developed S-DUDE (Shifting Discrete Universal DEnoiser), which was tailored to one-dimensional data, to the two-dimensional case. Our scheme competes with a genie that has access, in addition to the noisy data, also to the underlying noiseless data, and can employ $m$ different two-dimensional sliding window denoisers along $m$ distinct regions obtained by a quadtree decomposition with $m$ leaves, in a way that minimizes the overall loss. We show that, regardless of what the underlying noiseless data may be, the two-dimensional S-DUDE performs essentially as well as this genie, provided that the number of distinct regions satisfies $m = o(n)$, where $n$ is the total size of the data. The resulting algorithm complexity is still linear in both $n$ and $m$, as in the one-dimensional case. Our experimental results show that the two-dimensional S-DUDE can be effective when the characteristics of the underlying clean image vary across different regions in the data.
Index Terms: discrete denoising, two-dimensional data, quadtree decomposition, space-filling curves, Peano-
Discrete denoising is the problem of reconstructing the components of a finite-alphabet sequence based on the observation of its Discrete Memoryless Channel (DMC)-corrupted¹ version. Universal discrete denoising, in which no statistical or other properties are known a priori about the underlying clean data and the goal is to attain optimum performance, was considered and solved in [1]. The main result in [1] is the one for the semi-stochastic setting, which asserts that, regardless of what the underlying individual sequence may be, the Discrete Universal Denoiser (DUDE) attains the performance of the best sliding window denoiser that would be chosen by a genie who accesses, in addition to the noisy sequence, the underlying clean data. Recently, in [2], a generalization has been carried out for the case in which the characteristics of the underlying sequence change over time. The new scheme, called Shifting Discrete Universal Denoiser (S-DUDE), was shown to achieve the performance of the best combination of sliding window denoisers, allowing at most $m$ shifts (i.e., switches from one sliding window denoiser to another) along the sequence, provided that $m$ grows sub-linearly in the data size $n$, regardless of what the underlying noiseless sequence may be. It was also shown in [2] that the scheme can be implemented efficiently via dynamic programming, with linear complexity in both $n$ and $m$.

One of the domains in which DUDE found its application is image denoising. It was shown in the experimental results of [1] and [3] that DUDE matches or often outperforms the best of several state-of-the-art image denoisers for small-alphabet images, many of which are sliding window schemes. It is natural then to attempt to extend S-DUDE to images, namely, two-dimensional data, as well. The motivation is clear: images tend to have locally distinct characteristics, and allowing the sliding window denoisers to shift from one region to another may significantly improve the denoising performance compared to applying one fixed sliding window denoiser throughout all the data. However, whereas the extension of the DUDE to two-dimensional data was straightforward (cf. [1, Section VIII-C] and [3]), that of the S-DUDE is highly non-trivial, since it requires segmentation of the data, based on its noisy observation, into homogeneous regions in a way that minimizes the overall loss. Such segmentation is significantly more involved and often intractable, in contrast to the one-dimensional case of the S-DUDE, which only required dividing the data into distinct intervals with associated denoisers.

¹ The DMC is assumed known throughout this paper. This assumption is benign in applications where the DMC is easily learnable from the data.

Due to this difficulty of general segmentation of data, we instead adopt a restricted, yet rich enough, segmentation scheme, the quadtree decomposition, to build a reference class of shifting two-dimensional sliding window denoisers. Then, we employ the space-filling Peano-Hilbert curve [4], [5] to scan the data so that applying the original one-dimensional S-DUDE on the scanned data can achieve the best performance among the schemes in the reference class, regardless of the underlying clean data. The quadtree decomposition has been popular in image compression [6], [7] and pattern recognition [8], and recently in [9], it has also been applied to denoising continuous-valued signals by viewing denoising as a low-rate lossy compression problem. The Peano-Hilbert curves have been used, among other applications, in universal compression of two-dimensional data in both the individual sequence setting [4] and the probabilistic setting [10]. A more general problem of scanning and predicting multi-dimensional data was considered in [11], [12]. The combination of the quadtree decomposition and the Peano-Hilbert scanning for discrete denoising problems is the main contribution of this paper.

Our resulting denoising scheme, 2-D S-DUDE, still enjoys performance guarantees that parallel those of [2] for two-dimensional data. That is, regardless of what the underlying clean data might be, 2-D S-DUDE performs asymptotically as well as the best combination of the two-dimensional sliding window denoisers that can shift across at most $m$ distinct regions, as can be segmented by the best quadtree decomposition. Our use of the Peano-Hilbert scan is essential to obtain a scheme whose complexity remains linear in both the data size and the number of distinct segments $m$, whereas directly searching for the best quadtree decomposition may result in a scheme with complexity exponential in $m$. We show the effectiveness of our scheme by experimental results that demonstrate 2-D S-DUDE outperforming 2-D DUDE, particularly for images with space-varying characteristics, such as scanned magazines.

The rest of the paper is organized as follows: Section 2 collects necessary notation, preliminary results and detailed motivation for this work. Our main theoretical results and algorithm are given in Section 3, and the experimental results are presented in Section 4. Concluding remarks with a discussion of future work are given in Section 5.

2 Notation, Preliminaries, and Motivation 

2.1 Notation 

We follow the notation of [2]. Let $\mathcal{X}, \mathcal{Z}, \hat{\mathcal{X}}$ denote, respectively, the alphabets of the clean, noisy, and reconstructed sources, which are assumed to be finite. As in [1], [2], [13], the noisy sequence is a DMC-corrupted version of the clean one, where the channel matrix is denoted by $\Pi = \{\Pi(x,z)\}_{x \in \mathcal{X}, z \in \mathcal{Z}}$, and $\Pi(x,z)$ stands for the probability of a noisy symbol $z$ when the underlying clean symbol is $x$. Throughout the paper, $\Pi$ is assumed to be known and fixed, and of full row rank. When a reconstruction $\hat{x}$ is made for a clean symbol $x$, the goodness of the reconstruction is measured by a loss function $\Lambda : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$. Upper case letters denote random variables; lower case letters denote either individual deterministic quantities or specific realizations of random variables. Without loss of generality, the elements of any finite alphabet $\mathcal{V}$ will be identified with $\{0, 1, \ldots, |\mathcal{V}| - 1\}$. For a $\mathcal{V}$-valued sequence, we let $v^n = (v_1, \ldots, v_n)$, $v_m^n = (v_m, \ldots, v_n)$, and $v^{n \setminus t} = v^{t-1} v_{t+1}^n$. $\mathbb{R}^{\mathcal{V}}$ is the space of $|\mathcal{V}|$-dimensional column vectors with real-valued components indexed by the elements of $\mathcal{V}$.

Now, consider the set $\mathcal{S} = \{s : \mathcal{Z} \to \hat{\mathcal{X}}\}$, which is the (finite) set of mappings that take $\mathcal{Z}$ into $\hat{\mathcal{X}}$. We refer to elements of $\mathcal{S}$ as "single-symbol denoisers", since each $s \in \mathcal{S}$ can be thought of as a rule for estimating $x \in \mathcal{X}$ on the basis of $z \in \mathcal{Z}$. Then, for any $s \in \mathcal{S}$, we can always devise an estimated loss $\ell(Z, s)$ with the knowledge of $\Pi$, which is an unbiased estimate of the true expected loss $E_x \Lambda(x, s(Z))$, i.e., satisfying

$$E_x \ell(Z, s) = E_x \Lambda(x, s(Z)) \quad \forall x \in \mathcal{X}. \tag{1}$$

The expectation in (1) is with respect to the conditional distribution on $Z$ given $x$, $\Pi(x, \cdot)$. For more details on the motivation for using a loss function $\ell$ satisfying (1), and on its explicit form, readers may refer to [2, Section II-A].
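As an illustration of property (1), the following minimal sketch builds one estimated loss that satisfies it. It assumes a standard right-inverse construction (the vector of expected losses multiplied by the pseudo-inverse of $\Pi$), which is valid when $\Pi$ has full row rank; the explicit form used in [2, Section II-A] may be written differently. The function names, the binary symmetric channel, and the Hamming loss are illustrative choices, not taken from the paper.

```python
import itertools
import numpy as np

def single_symbol_denoisers(num_noisy, num_recon):
    """Enumerate S: every map s from {0..num_noisy-1} to {0..num_recon-1}, as a tuple."""
    return list(itertools.product(range(num_recon), repeat=num_noisy))

def expected_loss(Pi, Lam, s):
    """rho[x] = E_x Lambda(x, s(Z)) = sum_z Pi[x, z] * Lam[x, s[z]]."""
    num_clean, num_noisy = Pi.shape
    return np.array([
        sum(Pi[x, z] * Lam[x, s[z]] for z in range(num_noisy))
        for x in range(num_clean)
    ])

def estimated_loss(Pi, Lam, s):
    """ell[z] with sum_z Pi[x, z] * ell[z] = rho[x] for every clean symbol x.

    Assumption: Pi has full row rank, so pinv(Pi) is a right inverse of Pi.
    """
    return np.linalg.pinv(Pi) @ expected_loss(Pi, Lam, s)

# Illustrative setting: binary symmetric channel (crossover 0.1), Hamming loss.
delta = 0.1
Pi = np.array([[1 - delta, delta], [delta, 1 - delta]])
Lam = np.array([[0.0, 1.0], [1.0, 0.0]])

for s in single_symbol_denoisers(2, 2):
    ell = estimated_loss(Pi, Lam, s)
    # Unbiasedness (1): averaging ell over the channel output recovers rho.
    assert np.allclose(Pi @ ell, expected_loss(Pi, Lam, s))
```

Note that the estimated loss can be negative for individual symbols; only its channel average matches the true expected loss.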



2.2 DUDE and S-DUDE for 1-D data 

Here, we review and summarize the results from [1] and [2], and collect the ideas that will be needed for this paper. For one-dimensional data, an $n$-block denoiser is a collection of $n$ mappings $\hat{X}^n = \{\hat{X}_t\}_{1 \le t \le n}$, where $\hat{X}_t : \mathcal{Z}^n \to \hat{\mathcal{X}}$. The performance of the denoiser $\hat{X}^n$ on the individual sequence pair $(x^n, z^n)$ is measured by its normalized cumulative loss

$$L_{\hat{X}^n}(x^n, z^n) = \frac{1}{n} \sum_{t=1}^{n} \Lambda(x_t, \hat{X}_t(z^n)).$$

As argued in [2, Section II-B], the $n$-block denoiser $\hat{X}^n = \{\hat{X}_t\}_{1 \le t \le n}$ can be identified with $F^n = \{F_t\}_{1 \le t \le n}$, where $F_t : \mathcal{Z}^{n \setminus t} \to \mathcal{S}$ is defined as follows: $F_t(z^{n \setminus t}, \cdot)$ is the single-symbol denoiser in $\mathcal{S}$ satisfying

$$\hat{X}_t(z^n) = F_t(z^{n \setminus t}, z_t), \quad \forall z_t. \tag{2}$$

One special class of widely used $n$-block denoisers is that of the $k$-th order "sliding window" denoisers, which we denote by $\hat{X}^{n, \mathcal{S}_k}$. Such denoisers are of the form

$$\hat{X}_t(z^n) = s_k(z_{t-k}^{t+k}) = s_k(c_t, z_t) \tag{3}$$

for $t = k+1, \ldots, n-k$, where $s_k$ is an element of $\mathcal{S}_k = \{s_k : \mathcal{Z}^{2k+1} \to \hat{\mathcal{X}}\}$, the (finite) set of mappings from $\mathcal{Z}^{2k+1}$ into $\hat{\mathcal{X}}$, and $c_t = (z_{t-k}^{t-1}, z_{t+1}^{t+k})$ is the (two-sided) $k$-th order context for $z_t$. We refer to $s_k \in \mathcal{S}_k$ as a "$k$-th order denoiser". Note that $\mathcal{S}_0$ equals $\mathcal{S}$ of the previous section. $\mathcal{C}_k = \{(u_{-k}^{-1}, u_1^k) : (u_{-k}^{-1}, u_1^k) \in \mathcal{Z}^{2k}\}$ is the set of all possible $k$-th order contexts, and for given $z^n$ and each $c \in \mathcal{C}_k$,

$$\mathcal{T}(c) = \{\tau : c_\tau = c, \; k+1 \le \tau \le n-k\}$$

is further defined to be the set of indices where the context of $z_\tau$ equals $c$. From the association in (2) and the definition (3), we can deduce that for each $c \in \mathcal{C}_k$, the $k$-th order sliding window denoiser employs a time-invariant single-symbol denoiser, $s_k(c, \cdot)$, at all points $t \in \mathcal{T}(c)$.
In [1], the performance target of the denoising is

$$D_k(x^n, z^n) = \min_{s_k \in \mathcal{S}_k} \frac{1}{n-2k} \sum_{t=k+1}^{n-k} \Lambda(x_t, s_k(c_t, z_t)),$$

the minimum normalized loss on $(x^n, z^n)$ that can be attained by a $k$-th order sliding window denoiser. We can easily verify that for each $c \in \mathcal{C}_k$, the best $k$-th order sliding window denoiser that achieves $D_k(x^n, z^n)$ will employ the single-symbol denoiser

$$\arg\min_{s \in \mathcal{S}} \sum_{\tau \in \mathcal{T}(c)} \Lambda(x_\tau, s(z_\tau)), \tag{4}$$

at all points $t \in \mathcal{T}(c)$, which is determined from the joint empirical distribution of pairs $\{(x_\tau, z_\tau) : \tau \in \mathcal{T}(c)\}$. It is shown in [1, Theorem 1] that, despite the lack of knowledge of $x^n$, $D_k(x^n, Z^n)$ is essentially achievable by the Discrete Universal DEnoiser (DUDE), which accesses only $Z^n$ and is implementable with linear complexity in $n$. For each $c \in \mathcal{C}_k$, the DUDE can be shown to employ

$$\arg\min_{s \in \mathcal{S}} \sum_{\tau \in \mathcal{T}(c)} \ell(z_\tau, s) \tag{5}$$

at all points $t \in \mathcal{T}(c)$, where $\ell(z, s)$ is the loss function chosen to satisfy (1). By comparing (4) with (5), we observe that, for each context, the DUDE merely works with the estimated loss $\ell(z_\tau, s)$ in lieu of the true loss $\Lambda(x_\tau, s(z_\tau))$.
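The per-context rule (5) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation of [1]: the estimated loss is built with a pseudo-inverse of $\Pi$ (one construction satisfying (1)), boundary symbols without a full context are left as observed, and the names `loss_table` and `dude_1d` are hypothetical.

```python
import itertools
from collections import defaultdict
import numpy as np

def loss_table(Pi, Lam):
    """Return (S, ell): all maps s: Z -> X_hat, and ell[z, j] = estimated loss of S[j] at z."""
    num_clean, num_noisy = Pi.shape
    S = list(itertools.product(range(Lam.shape[1]), repeat=num_noisy))
    rho = np.array([[sum(Pi[x, z] * Lam[x, s[z]] for z in range(num_noisy))
                     for s in S] for x in range(num_clean)])     # rho[x, j]
    return S, np.linalg.pinv(Pi) @ rho                            # ell[z, j]

def dude_1d(z, k, Pi, Lam):
    """Rule (5): for each context, pick the s minimizing the summed estimated loss."""
    S, ell = loss_table(Pi, Lam)
    n = len(z)
    context = lambda t: (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
    totals = defaultdict(lambda: np.zeros(len(S)))
    for t in range(k, n - k):              # 0-based version of t = k+1, ..., n-k
        totals[context(t)] += ell[z[t]]    # accumulate ell(z_t, s) for every s
    x_hat = list(z)                        # boundary symbols stay as observed
    for t in range(k, n - k):
        s = S[int(np.argmin(totals[context(t)]))]
        x_hat[t] = s[z[t]]
    return x_hat

# Illustrative run: binary symmetric channel (crossover 0.1), Hamming loss.
Pi = np.array([[0.9, 0.1], [0.1, 0.9]])
Lam = np.array([[0.0, 1.0], [1.0, 0.0]])
noisy = [0] * 20
noisy[5] = noisy[12] = 1                   # two isolated ones in a run of zeros
denoised = dude_1d(noisy, 1, Pi, Lam)
```

Only the per-context sums of estimated losses enter the decision, which is why the order of the symbols within a context is irrelevant for the DUDE.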
The idea of working with the estimated loss to achieve the genie-aided performance has been adopted again in [2] to refine the result of [1]. The main motivation of [2] was the observation that when the characteristics of the underlying sequence $x^n$ change with time, allowing the $k$-th order denoiser to change from one interval of the data to another can further reduce the overall loss significantly. Therefore, whereas the DUDE competed with the best fixed $k$-th order denoiser, [2] competes with the best among $\mathcal{S}^n_{k,m}$, a set of "combinations" of $k$-th order denoisers $\{s_{k,t}\}_{t=k+1}^{n-k}$ that allow at most $m$ shifts within $t \in \mathcal{T}(c)$ for each $c \in \mathcal{C}_k$. Thus, [2] sets a more ambitious performance target

$$D_{k,m}(x^n, z^n) = \min_{\mathbf{S} \in \mathcal{S}^n_{k,m}} \frac{1}{n-2k} \sum_{t=k+1}^{n-k} \Lambda(x_t, s_{k,t}(c_t, z_t)), \tag{6}$$

the minimum normalized loss on $(x^n, z^n)$ that can be achieved by a sequence of $k$-th order denoisers that allows at most $m$ shifts (changes) within each context. It is clear that $D_{k,m}(x^n, z^n) \le D_k(x^n, z^n)$ for all $(x^n, z^n)$. The new algorithm devised in [2] was called the $(k, m)$-Shifting Discrete Universal DEnoiser (S-DUDE), $\hat{X}^{n,k,m}_{\text{univ}}$, and was able to asymptotically achieve $D_{k,m}(x^n, Z^n)$ on the basis of $Z^n$ only, provided that $m$ grows sub-linearly in $n$. The key trick was again to work with the estimated loss $\ell(z, s)$ to obtain

$$\hat{\mathbf{S}}_{k,m} = \arg\min_{\mathbf{S} \in \mathcal{S}^n_{k,m}} \frac{1}{n-2k} \sum_{t=k+1}^{n-k} \ell(z_t, s_{k,t}(c_t, \cdot)), \tag{7}$$

and employ it throughout the sequence. The following theorem shows that by utilizing the estimated loss, we can successfully learn the best (at most $m$) shifts of $k$-th order denoisers throughout $z^n$ to minimize the overall normalized loss.

Theorem 1 ([2, Theorem 4]) Suppose $k = k_n$ and $m = m_n$ grow with $n$ sufficiently slowly to satisfy conditions detailed in [2, Claim 1]. Then, for all $\mathbf{x} \in \mathcal{X}^\infty$, the sequence of denoisers $\{\hat{X}^{n,k,m}_{\text{univ}}\}$ satisfies

a) $\lim_{n \to \infty} \left[ L_{\hat{X}^{n,k,m}_{\text{univ}}}(x^n, Z^n) - D_{k,m}(x^n, Z^n) \right] = 0$ a.s.

b) For any $\delta > 0$, $E\left[ L_{\hat{X}^{n,k,m}_{\text{univ}}}(x^n, Z^n) - D_{k,m}(x^n, Z^n) \right] = O\left( \sqrt{k_n |\mathcal{Z}|^{2k_n} \left( \frac{m_n}{n} \right)^{1-\delta}} \right)$.

Besides the performance guarantees, another key component of [2] was developing an algorithm that can implement (7) efficiently, i.e., with complexity linear in both $n$ and $m$. The details can be found in [2, Section V-A].
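To make the shift-constrained minimization concrete for $k = 0$, here is a hedged sketch of a plain dynamic program over (position, number of switches used, current denoiser). It is not the two-pass algorithm of [2, Section V-A]; it is a simpler formulation that stores back-pointers, and its cost is linear in $n$ and $m$ (and quadratic in $|\mathcal{S}|$ as written). The function name and the toy cost matrix are illustrative.

```python
import numpy as np

def best_shifting_sequence(cost, m):
    """Minimize sum_t cost[t, path[t]] over index paths with at most m switches.

    cost[t, j] plays the role of the estimated loss ell(z_t, s_j).
    Returns (minimum total, path as a list of denoiser indices).
    """
    n, num_s = cost.shape
    # best[r, j]: cheapest prefix that ends in denoiser j after exactly r switches
    best = np.full((m + 1, num_s), np.inf)
    best[0] = cost[0]
    back = np.zeros((n, m + 1, num_s), dtype=int)   # previous denoiser index
    for t in range(1, n):
        new = np.full_like(best, np.inf)
        for r in range(m + 1):
            for j in range(num_s):
                cand, prev = best[r, j], j           # keep the same denoiser
                if r > 0:
                    others = best[r - 1].copy()
                    others[j] = np.inf               # a switch must change the index
                    i = int(np.argmin(others))
                    if others[i] < cand:
                        cand, prev = others[i], i
                new[r, j] = cand + cost[t, j]
                back[t, r, j] = prev
        best = new
    r, j = np.unravel_index(int(np.argmin(best)), best.shape)
    total = float(best[r, j])
    path = [0] * n
    for t in range(n - 1, -1, -1):                   # follow back-pointers
        path[t] = int(j)
        if t > 0:
            prev = back[t, r, j]
            if prev != j:
                r -= 1                               # this step consumed a switch
            j = prev
    return total, path

# Toy costs: denoiser 0 is better on the first half, denoiser 1 on the second.
toy_cost = np.array([[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 3)
```

With `m = 0` the program reduces to picking one fixed denoiser, which corresponds to the DUDE target; allowing one switch lets the path follow the change in the toy data.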

2.3 Motivation 

As pointed out in [2, Section V-B], both DUDE and S-DUDE run the same algorithm in parallel along each subsequence associated with each context, and this characteristic enables us to extend both schemes to the two-dimensional (2-D) data case: use 2-D contexts and again run the algorithms on each subsequence. However, as noted in the Introduction, the extension to 2-D data of the S-DUDE is not as straightforward as that of the DUDE. The main reason is that, whereas the output of DUDE is independent of the ordering of data within each context and only requires the empirical distribution of the data, the ordering of said data is very consequential for S-DUDE's output and its performance.

The ordering of data is naturally given and fixed in one-dimensional (1-D) data. Therefore, S-DUDE only had to find shifting points based on noisy data so that applying different sliding window denoisers in different intervals will minimize the overall loss. Figure 1 shows one such segmentation in which different colors represent intervals where different sliding window denoisers are applied.

Figure 1: Segmentation of 1-D data

In contrast, in the 2-D case, it is not clear how the 2-D version of S-DUDE should segment the data into homogeneous 2-D regions, instead of intervals, in order to allow shifting of sliding window denoisers across the data. As depicted in Figure 2, the optimal segmentation that leads to the minimum loss can be arbitrary, and hence, trying to learn the best segmentation solely based on noisy data would be overly ambitious.

Figure 2: Segmentation of 2-D data

One naive approach to avoid such a 2-D segmentation issue would be to first raster scan the image, then apply the ordinary S-DUDE on the resulting 1-D data, which was the method used in [2, Section VI-A]. However, this could often result in poor performance of the scheme since it may require the S-DUDE to shift too frequently, i.e., $m$ to become linear in the data size $n$, which violates the necessary condition specified in [2, Theorem 5] for the scheme to work. This point can be seen by imagining the situation of running the raster scan vertically on the image in [2, Figure 2], where even though the image consists of a small number of two-dimensional regions, when raster scanned into one-dimensional data the number of changes of the data characteristics grows significantly.

To address this issue of segmenting and scanning the 2-D data in general, in this paper we focus on a more regularized class of segmentation schemes, the quadtree decomposition, to build a reference class of shifting sliding window denoisers rich enough for the denoising task at hand. Then, in order to compete with the reference class, we utilize the space-filling Peano-Hilbert (PH) curve to scan subsequence points for each 2-D context and run the ordinary S-DUDE on the PH-scanned 1-D data. In the next section, we describe our scheme formally, and prove a performance guarantee for it, which parallels that of the scheme for 1-D data in [2].

3 S-DUDE for 2-D data 



Before presenting the 2-D extension of S-DUDE, we introduce additional notation in Sections 3.1 through 3.3. Then, in Section 3.4, we derive our scheme and present theoretical guarantees of its performance. In Section 3.5, we succinctly describe the algorithm and its complexity.

3.1 2-D data and contexts 

We represent the 2-D data with the coordinate of each data point. For simplicity, we assume the 2-D data is always in the square form.² Then, denote $\mathcal{T}_N = \{t \in \mathbb{Z}^2 : t = (t_1, t_2), \; 1 \le t_1 \le N, \; 1 \le t_2 \le N\}$ as the set of coordinates of the given 2-D data. Also, let $n = |\mathcal{T}_N| = N \times N$ be the total size of the data. For $t \in \mathcal{T}_N$, $z_t$ will denote the noisy symbol at location $t = (t_1, t_2)$, and $x^{N \times N}$ and $z^{N \times N}$ will denote the total clean and noisy 2-D data, respectively. In addition, for $t \in \mathcal{T}_N$, $z^{N \times N \setminus t}$ will denote $\{z_i : i \in \mathcal{T}_N, i \ne t\}$. With this notation, notions of the 2-D $n$-block denoiser $\hat{X}^n_{2D}$, the normalized cumulative loss

$$L_{\hat{X}^n_{2D}}(x^{N \times N}, z^{N \times N}) = \frac{1}{n} \sum_{t \in \mathcal{T}_N} \Lambda(x_t, \hat{X}_{t,2D}(z^{N \times N})),$$

and the association in (2) follow naturally in parallel to the 1-D data case.

The 2-D $k$-th order sliding window denoisers can be understood similarly. First, consider the sequence of coordinates, $\mathcal{I} = (i_1, i_2, i_3, \ldots)$, in the 2-D lattice of integers, in which the coordinates are enumerated in the order of increasing distance to the origin as in Figure 3. Then, the 2-D $k$-th order context for $z_t$ is defined to be $c_{t,2D} = (z_{t+i_1}, \ldots, z_{t+i_{2k}})$, where $(i_1, \ldots, i_{2k})$ are the first $2k$ coordinates of $\mathcal{I}$, and the additions of coordinates simply boil down to the translation of the coordinates. We also denote $\mathcal{C}_{k,2D}$ as the set of all possible 2-D $k$-th order contexts. Then, the 2-D $k$-th order sliding window denoiser at location $t$ is again of the form

$$\hat{X}_{t,2D}(z^{N \times N}) = s_k(c_{t,2D}, z_t),$$

with $c_{t,2D} \in \mathcal{C}_{k,2D}$, paralleling (3).

² For other cases, we can simply fill in remaining regions with dummy symbols.

Figure 3: The ordering of coordinates of the 2-D integer lattice plane
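The construction of the 2-D context can be sketched as follows. The order among lattice points at equal distance from the origin is fixed by Figure 3 in the paper; since that figure is not reproduced here, the sketch breaks ties by angle, which is purely an assumption. Coordinates are 0-based (row, column) pairs, and both function names are hypothetical.

```python
import math

def lattice_offsets(count):
    """First `count` nonzero integer offsets, ordered by increasing distance to the origin.

    Ties are broken by angle (an assumption; the paper fixes the order in Figure 3).
    """
    radius = 1
    while True:
        # All lattice points within Euclidean distance `radius`, origin excluded.
        pts = [(a, b) for a in range(-radius, radius + 1)
                      for b in range(-radius, radius + 1)
                      if 0 < a * a + b * b <= radius * radius]
        if len(pts) >= count:
            break
        radius += 1
    pts.sort(key=lambda p: (p[0] ** 2 + p[1] ** 2, math.atan2(p[1], p[0])))
    return pts[:count]

def context_2d(z, t, k):
    """2-D k-th order context of z at t = (row, col): its 2k nearest neighbours.

    Returns None when the window does not fit inside the data.
    """
    rows, cols = len(z), len(z[0])
    ctx = []
    for a, b in lattice_offsets(2 * k):
        r, c = t[0] + a, t[1] + b
        if not (0 <= r < rows and 0 <= c < cols):
            return None
        ctx.append(z[r][c])
    return tuple(ctx)

grid = [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]
center_context = context_2d(grid, (1, 1), 2)   # the four nearest neighbours of the center
```

With $2k = 4$ the context is the four-neighbourhood, and with $2k = 8$ it is the full $3 \times 3$ window around the symbol, regardless of the tie-breaking rule.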



3.2 The quadtree (QT) decomposition 

A quadtree (QT) is a tree in which every node is either a leaf or a parent node with four children. This QT structure can be used to segment the 2-D data as follows. Each node of a QT at depth $d$ represents a quadrant of size $\frac{N}{2^d} \times \frac{N}{2^d}$ (assuming $N = 2^r$ and $d \le r$), and a child node represents one of the four quadrants inside the parent node's quadrant. The four children of a parent node are associated with the four quadrants of the parent node's quadrant in the order (upper-left quadrant) → (upper-right quadrant) → (lower-right quadrant) → (lower-left quadrant). The root node is associated with the whole 2-D data. The leaves of a QT represent the final segmentation of the 2-D data for the given QT. Thus, if a QT has $m$ leaves, the resulting segmentation has $m$ distinct regions. For example, in Figure 4(a) the two-dimensional data is segmented into 13 different regions, and the corresponding QT in Figure 4(b) has $m = 13$ leaves. We also denote $\mathcal{Q}_m$ as the set of all QTs that have $m$ leaves, and for given $q \in \mathcal{Q}_m$, define $R_q : \mathcal{T}_N \to \{1, \ldots, m\}$ as a mapping that maps each coordinate of the 2-D data to the one of the $m$ leaves in $q$ corresponding to the region containing the coordinate. Although the QT decomposition is limited in the sense that it only decomposes the data into quadrants, it has been shown to be effective in many applications since it gives a compact representation of segmentation and is rich enough for capturing local similarity of data.

(a) 2-D data decomposition (b) QT decomposition

Figure 4: The decomposition of 2-D data and its corresponding QT ($m = 13$)
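A small sketch of the region map $R_q$ may help fix ideas. The representation below (a leaf is `None`, a parent is a 4-tuple of children) and the convention that "upper" means small row index are assumptions made for illustration; leaves are numbered in the order in which the children are visited.

```python
import numpy as np

def region_map(tree, N):
    """Label each coordinate of an N x N array with the index (1..m) of its quadtree leaf.

    A leaf is None; a parent is a 4-tuple of children ordered
    (upper-left, upper-right, lower-right, lower-left), as in the text.
    Returns (labels, m), where m is the number of leaves.
    """
    labels = np.zeros((N, N), dtype=int)
    counter = [0]

    def fill(node, r0, c0, size):
        if node is None:                      # leaf: one homogeneous region
            counter[0] += 1
            labels[r0:r0 + size, c0:c0 + size] = counter[0]
            return
        if size < 2:
            raise ValueError("tree is deeper than log2(N)")
        h = size // 2
        ul, ur, lr, ll = node
        fill(ul, r0, c0, h)
        fill(ur, r0, c0 + h, h)
        fill(lr, r0 + h, c0 + h, h)
        fill(ll, r0 + h, c0, h)

    fill(tree, 0, 0, N)
    return labels, counter[0]

# Root split once, and its upper-right child split again: m = 7 leaves.
example_tree = (None, (None, None, None, None), None, None)
example_labels, example_m = region_map(example_tree, 4)
```

Every split replaces one leaf by four, so the number of leaves of any quadtree is of the form $3j + 1$, where $j$ is the number of parent nodes.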



3.3 Peano-Hilbert (PH) Curve 

The Peano-Hilbert (PH) curves are well known as space-filling curves. They possess the property that, for any level of quadrants, the curves never leave a quadrant before visiting all the sites within the quadrant. The details of PH curves and scanning orders can be found in [4, Section II]. Examples of PH curves for 2-D data with $N = 2^4$ and $N = 2^8$ are shown in Figure 5.

The PH curve naturally defines an ordering of the 2-D data according to the order in which the PH curve fills the plane. Then, for noisy 2-D data $z^{N \times N}$, we denote $z^n_{PH}$ as its PH-scanned noisy 1-D sequence and $x^n_{PH}$ as the corresponding clean 1-D sequence. In addition, we denote the $i$-th index according to the PH scan by $PH_i$. Note that $PH_i \in \mathcal{T}_N$ for $i = 1, \ldots, n$. Thus, for example, $z_{PH_i}$ stands for the $i$-th component of the $n$-tuple $z^n_{PH}$, and $c_{PH_i, 2D} \in \mathcal{C}_{k,2D}$ is the 2-D $k$-th order context at that location.

(a) PH(4) curve (16 x 16) (b) PH(8) curve (256 x 256)

Figure 5: Peano-Hilbert curves
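The quadrant property can be checked numerically. The sketch below uses the widely known iterative index-to-coordinate conversion for the Hilbert curve; the orientation of the curve drawn in Figure 5 may differ by a rotation or reflection, which does not affect the property. Coordinates are 0-based, and the function names are illustrative.

```python
def hilbert_d2xy(N, d):
    """Map a scan index d in [0, N*N) to a coordinate on an N x N grid (N a power of 2)."""
    x = y = 0
    s, t = 1, d
    while s < N:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the sub-square built so far
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(N):
    """Coordinates of the N x N grid in Hilbert scan order."""
    return [hilbert_d2xy(N, d) for d in range(N * N)]

def stays_in_quadrants(N):
    """Check that, at every level, each run of b*b scan indices fills one aligned b x b block."""
    scan = hilbert_scan(N)
    b = 1
    while b <= N:
        for start in range(0, N * N, b * b):
            blocks = {(x // b, y // b) for x, y in scan[start:start + b * b]}
            if len(blocks) != 1:
                return False
        b *= 2
    return True
```

Because every dyadic quadrant is visited as one contiguous run of scan indices, a region that is a quadtree leaf maps to a single interval of the scanned 1-D sequence.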

3.4 Derivation of the scheme and performance guarantees 

Equipped with the definitions and notation in the previous subsections, we now follow the line of argument paralleling that in [2]: begin with the case of symbol-by-symbol denoisers ($k = 0$) to build the main idea, then move on to the general $k$-th order case. For simplicity, we assume $N = 2^r$, and $n = N \times N = 2^{2r}$.



3.4.1 Competing with Combinations of Single-Symbol Denoisers (k = 0)

Consider a 2-D $n$-tuple of single-symbol denoisers $\mathbf{S} = \{s_t : t \in \mathcal{T}_N\} \in \mathcal{S}_0^n$. For such $\mathbf{S}$, we can associate the 2-D $n$-block denoiser $\hat{X}^{n,\mathbf{S}}_{2D}$ as

$$\hat{X}^{\mathbf{S}}_{t,2D}(z^{N \times N}) = s_t(z_t) \tag{8}$$

for all $t \in \mathcal{T}_N$. In order to construct the reference class based on the QT decomposition described in Section 3.2, we define a subset $\mathcal{S}_m(q) \subset \mathcal{S}_0^n$ associated with a given QT $q \in \mathcal{Q}_m$ as

$$\mathcal{S}_m(q) \triangleq \{\mathbf{S} \in \mathcal{S}_0^n : s_i = s_j \text{ if } R_q(i) = R_q(j), \text{ for all } i, j \in \mathcal{T}_N\}.$$

In words, $\mathcal{S}_m(q)$ is a set of 2-D $n$-tuples of single-symbol denoisers whose denoising rules are constant within each of the $m$ distinct regions defined by a QT $q \in \mathcal{Q}_m$. Now, for fixed $n$ and $m$, define a set $\mathcal{S}^n_{0,\mathcal{Q}_m} \subset \mathcal{S}_0^n$ as

$$\mathcal{S}^n_{0,\mathcal{Q}_m} = \bigcup_{q \in \mathcal{Q}_m} \mathcal{S}_m(q), \tag{9}$$

which is the set of all possible configurations of single-symbol denoisers confined to be constant in regions defined by QTs with $m$ leaves. The following simple lemma presents a lower bound on the size of the set $\mathcal{S}^n_{0,\mathcal{Q}_m}$ in terms of the number of segmented regions $m$.

Lemma 1 The set $\mathcal{S}^n_{0,\mathcal{Q}_m}$ defined in (9) satisfies

l^ Q J= fi ( 3f )- 

Proof: By the definition of the QT, we observe that the number of leaves always has the form $m = 3j + 1$, where $j \in \mathbb{N} \cup \{0\}$. Then, for given $m$, we can see that

i-i 

i=0 

which is from the fact that a new segmentation of a quadrant can happen in any of the leaves of the original QT. 
Therefore, 

lim ^£ft > > 



and we have the lemma. ■ 
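As a rough numerical companion to the observation that $m = 3j + 1$, the sketch below counts quadtrees with $m$ leaves when the depth is not constrained. It uses the standard Fuss-Catalan count of full 4-ary trees with $j$ parent nodes, $\binom{4j}{j}/(3j+1)$, which is not the counting argument of the proof above; for data of size $N = 2^r$ the count is exact only while no tree with $j$ parent nodes can exceed depth $r$ (for instance when $j \le r$). The function name is illustrative.

```python
from math import comb

def num_quadtrees(m):
    """Number of quadtrees with m leaves, ignoring any depth limit (Fuss-Catalan number)."""
    if m < 1 or (m - 1) % 3 != 0:
        return 0                       # leaf counts are always of the form 3j + 1
    j = (m - 1) // 3                   # number of parent (internal) nodes
    return comb(4 * j, j) // (3 * j + 1)

# Growth of the number of candidate segmentations with the number of regions.
counts = {m: num_quadtrees(m) for m in (1, 4, 7, 10, 13)}
```

The rapid growth of these counts illustrates why an exhaustive search over all quadtrees with $m$ leaves quickly becomes impractical.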

Given the reference class of switching single-symbol denoisers (9), we define the performance target for given 2-D data $(x^{N \times N}, z^{N \times N})$ as

$$D^{2D}_{0,m}(x^{N \times N}, z^{N \times N}) = \min_{\mathbf{S} \in \mathcal{S}^n_{0,\mathcal{Q}_m}} \frac{1}{n} \sum_{t \in \mathcal{T}_N} \Lambda(x_t, s_t(z_t)), \tag{10}$$

i.e., the best denoising performance attainable among all combinations of single-symbol denoisers in $\mathcal{S}^n_{0,\mathcal{Q}_m}$. In order to find the combination of single-symbol denoisers that asymptotically achieves (10) based only on $Z^{N \times N}$, we may again use the idea of utilizing the estimated loss (1) in place of the true loss as in [2] to find

$$\arg\min_{\mathbf{S} \in \mathcal{S}^n_{0,\mathcal{Q}_m}} \frac{1}{n} \sum_{t \in \mathcal{T}_N} \ell(Z_t, s_t). \tag{11}$$

However, the naive brute-force algorithm to find the achiever of (11) requires an exhaustive search over the set $\mathcal{S}^n_{0,\mathcal{Q}_m}$, which results in exponential complexity in $m$ as specified in Lemma 1. In contrast to the 1-D case, an efficient algorithm that directly finds the best combination of single-symbol denoisers in $\mathcal{S}^n_{0,\mathcal{Q}_m}$ does not appear to exist. To circumvent this issue, the PH scanning comes into play and serves as a key component for devising an efficient algorithm to attain performance essentially at least as good as (10). To this end, we define another set of combinations of single-symbol denoisers

$$\mathcal{S}^{PH(n)}_{0,m} \triangleq \Big\{\mathbf{S} \in \mathcal{S}_0^n : \sum_{i=2}^{n} \mathbf{1}\{s_{PH_{i-1}} \ne s_{PH_i}\} \le m\Big\}. \tag{12}$$

In words, $\mathcal{S}^{PH(n)}_{0,m}$ is a set of combinations of single-symbol denoisers that have at most $m$ switches when the denoisers are ordered according to the PH scanning order. Equation (12) is identical to [2, eq. (20)] except that it is for the PH-scanned 1-D data of the original 2-D data. We can now define

$$\hat{\mathbf{S}} = \arg\min_{\mathbf{S} \in \mathcal{S}^{PH(n)}_{0,m}} \frac{1}{n} \sum_{i=1}^{n} \ell(Z_{PH_i}, s_{PH_i}), \tag{13}$$

which can be found with linear complexity in both $n$ and $m$ by the two-pass dynamic programming algorithm established in [2, Section V]. We denote our 2-D $(0, m)$-S-DUDE as $\hat{X}^{n,0,m}_{2D,\text{univ}}$ and define it to be $\hat{X}^{n,\hat{\mathbf{S}}}_{2D}$ (recalling the notation $\hat{X}^{n,\mathbf{S}}_{2D}$ from (8)). Before stating the performance guarantee of our scheme, we have the following lemma.



Lemma 2 Define a quantity

$$D^{PH}_{0,m}(x^{N \times N}, z^{N \times N}) \triangleq \min_{\mathbf{S} \in \mathcal{S}^{PH(n)}_{0,m}} \frac{1}{n} \sum_{t \in \mathcal{T}_N} \Lambda(x_t, s_t(z_t)).$$

Then, we have

$$D^{PH}_{0,m}(x^{N \times N}, z^{N \times N}) \le D^{2D}_{0,m}(x^{N \times N}, z^{N \times N})$$

for every $(x^{N \times N}, z^{N \times N})$.

Proof: From the definitions (9) and (12), we can see that

$$\mathcal{S}^n_{0,\mathcal{Q}_m} \subseteq \mathcal{S}^{PH(n)}_{0,m},$$

since any 2-D $n$-tuple of single-symbol denoisers in $\mathcal{S}^n_{0,\mathcal{Q}_m}$ would also be in $\mathcal{S}^{PH(n)}_{0,m}$ after reordering it in the PH scanning order. This is because the PH scan never leaves a quadrant before visiting all the data points in the quadrant, and the shifting positions for sequences in $\mathcal{S}^{PH(n)}_{0,m}$ can appear anywhere in the data, whereas the shifting positions in $\mathcal{S}^n_{0,\mathcal{Q}_m}$ always appear on the boundary of the quadtree-decomposed quadrants. Therefore, as the objective functions are identical for $D^{PH}_{0,m}(x^n_{PH}, z^n_{PH})$ and $D^{2D}_{0,m}(x^{N \times N}, z^{N \times N})$, we get the lemma. ■
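The inclusion used in the proof can be illustrated on a small example: a labelling that is constant on the $m$ leaves of a quadtree has only $m - 1$ changes along a Hilbert-type scan, while a raster scan of the same labelling changes far more often. The sketch below assumes the standard iterative Hilbert index-to-coordinate conversion (its orientation may differ from the paper's figures) and a hand-built $8 \times 8$ labelling with $m = 7$ leaves; all names are illustrative.

```python
import numpy as np

def hilbert_d2xy(N, d):
    """Map scan index d to a coordinate on an N x N grid (N a power of 2)."""
    x = y = 0
    s, t = 1, d
    while s < N:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def count_switches(labels, order):
    """Number of label changes between consecutive positions of a scan order."""
    seq = [labels[r, c] for r, c in order]
    return sum(a != b for a, b in zip(seq, seq[1:]))

N, m = 8, 7
labels = np.zeros((N, N), dtype=int)
labels[:4, :4] = 1            # upper-left leaf
labels[4:, :4] = 2            # lower-left leaf
labels[4:, 4:] = 3            # lower-right leaf
labels[:2, 4:6] = 4           # the upper-right quadrant is split into four leaves
labels[:2, 6:] = 5
labels[2:4, 6:] = 6
labels[2:4, 4:6] = 7

ph_order = [hilbert_d2xy(N, d) for d in range(N * N)]
raster_order = [(r, c) for r in range(N) for c in range(N)]
ph_switches = count_switches(labels, ph_order)          # one run per leaf
raster_switches = count_switches(labels, raster_order)  # changes inside and between rows
```

Each leaf is an aligned dyadic block and is therefore traversed as one contiguous run, so the number of changes along the scan is one less than the number of leaves, which is within the budget of $m$ switches in (12).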

The following theorem gives the concentration result on the performance of $\hat{X}^{n,0,m}_{2D,\text{univ}}$.



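Theorem 2 For $\hat{X}^{n,0,m}_{2D,\text{univ}}$ defined in (13), and for all $\epsilon > 0$ and $x^{N \times N} \in \mathcal{X}^{N \times N}$, we have

$$\Pr\left( L_{\hat{X}^{n,0,m}_{2D,\text{univ}}}(x^{N \times N}, Z^{N \times N}) - D^{2D}_{0,m}(x^{N \times N}, Z^{N \times N}) > \epsilon \right) \le 2 \exp\left( -n \left[ \frac{\epsilon^2}{2 L_{\max}^2} - \left\{ h\left(\frac{m}{n}\right) + \frac{(m+1) \ln |\mathcal{S}|}{n} \right\} \right] \right), \tag{14}$$

where $h(x) = -x \ln x - (1-x) \ln(1-x)$ for $0 \le x \le 1$, $L_{\max} = \max_{x \in \mathcal{X}, \hat{x} \in \hat{\mathcal{X}}} \Lambda(x, \hat{x}) + \max_{z \in \mathcal{Z}, s \in \mathcal{S}} \ell(z, s)$, and $|\mathcal{S}| = |\hat{\mathcal{X}}|^{|\mathcal{Z}|}$, the size of the set of single-symbol denoisers.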

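Proof: From the union bound, we have

$$\Pr\left( L_{\hat{X}^{n,0,m}_{2D,\text{univ}}}(x^{N \times N}, Z^{N \times N}) - D^{2D}_{0,m}(x^{N \times N}, Z^{N \times N}) > \epsilon \right)$$

$$= \Pr\left( L_{\hat{X}^{n,0,m}_{2D,\text{univ}}}(x^{N \times N}, Z^{N \times N}) - D^{PH}_{0,m}(x^{N \times N}, Z^{N \times N}) + D^{PH}_{0,m}(x^{N \times N}, Z^{N \times N}) - D^{2D}_{0,m}(x^{N \times N}, Z^{N \times N}) > \epsilon \right)$$

$$\le \Pr\left( L_{\hat{X}^{n,0,m}_{2D,\text{univ}}}(x^{N \times N}, Z^{N \times N}) - D^{PH}_{0,m}(x^{N \times N}, Z^{N \times N}) > \epsilon \right), \tag{15}$$

since

$$\Pr\left( D^{PH}_{0,m}(x^{N \times N}, Z^{N \times N}) - D^{2D}_{0,m}(x^{N \times N}, Z^{N \times N}) > 0 \right) = 0$$

from Lemma 2. Therefore, the event in (15) becomes identical to the one in the 1-D problem, and the probability bound (14) is obtained by exactly the same analysis given in [2, Theorem 2]. ■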

3.4.2 Competing with Combinations of k-th order denoisers

Having established the result for the single-symbol denoiser case, we can now move on to the general case of competing with $k$-th order denoisers. For general $k > 0$, let $\bar{k} = \lceil k/4 \rceil$ and $n_k = (N - 2\bar{k}) \times (N - 2\bar{k})$. Define

$$\mathcal{T}_{N_k} = \{t \in \mathbb{Z}^2 : t = (t_1, t_2), \; \bar{k} + 1 \le t_1 \le N - \bar{k}, \; \bar{k} + 1 \le t_2 \le N - \bar{k}\},$$

which is a subset of $\mathcal{T}_N$ with size $|\mathcal{T}_{N_k}| = n_k$. For given $z^{N \times N}$, we define the $n_k$-tuple of ($k$-th order denoiser induced) single-symbol denoisers

$$\mathbf{S}_k(z^{N \times N}) \triangleq \{s_{k,t}(c_{t,2D}, \cdot)\}_{t \in \mathcal{T}_{N_k}} \in \mathcal{S}^{n_k},$$

where $c_{t,2D}$ is the 2-D $k$-th order context defined in Section 3.1. For brevity, we suppress the dependence on $z^{N \times N}$ in $\mathbf{S}_k(z^{N \times N})$ and denote it as $\mathbf{S}_k$. Similarly to the case of $k = 0$, for given $\mathbf{S}_k$, we can associate the 2-D $n$-block denoiser $\hat{X}^{n,\mathbf{S}_k}_{2D}$ as

$$\hat{X}^{\mathbf{S}_k}_{t,2D}(z^{N \times N}) = s_{k,t}(c_{t,2D}, z_t), \quad t \in \mathcal{T}_{N_k}.$$

As in the 1-D case [2], for each $c \in \mathcal{C}_{k,2D}$, we define $\mathcal{T}(c) = \{\tau : c_{\tau,2D} = c, \; \tau \in \mathcal{T}_{N_k}\}$ as the set of indices of $z^{N \times N}$ where the 2-D context equals $c$. Then, for fixed $m$, each quadtree $q \in \mathcal{Q}_m$, and given $z^{N \times N}$, define a subset $\mathcal{S}_{k,m}(q) \subset \mathcal{S}^{n_k}$ as

$$\mathcal{S}_{k,m}(q) = \bigcup_{c \in \mathcal{C}_{k,2D}} \big\{ \{s_{k,t}(c, \cdot)\}_{t \in \mathcal{T}(c)} : s_{k,i}(c, \cdot) = s_{k,j}(c, \cdot) \text{ if } R_q(i) = R_q(j) \big\},$$

the set of $n_k$-tuples of ($k$-th order denoiser induced) single-symbol denoisers that, within the sub-data points $\{t : t \in \mathcal{T}(c)\}$ for each $c \in \mathcal{C}_{k,2D}$, are identical on the regions decomposed by $q \in \mathcal{Q}_m$. Then, define

$$\mathcal{S}^n_{k,\mathcal{Q}_m}(z^{N \times N}) = \bigcup_{q \in \mathcal{Q}_m} \mathcal{S}_{k,m}(q)$$

as all combinations of ($k$-th order denoiser induced) single-symbol denoisers that, within each subsequence associated with each $c \in \mathcal{C}_{k,2D}$, are confined to remain constant within each of the regions determined by QTs with $m$ leaves. Again, for brevity, the dependence on $z^{N \times N}$ in $\mathcal{S}^n_{k,\mathcal{Q}_m}(z^{N \times N})$ is suppressed, and we write $\mathcal{S}^n_{k,\mathcal{Q}_m}$. Note that the above notation simply generalizes that of Section 3.4.1 by parallelizing over each 2-D context. Finally, in analogy to (10), for given $(x^{N \times N}, z^{N \times N})$, we define the $k$-th order performance target as

$$D^{2D}_{k,m}(x^{N \times N}, z^{N \times N}) = \min_{\mathbf{S}_k \in \mathcal{S}^n_{k,\mathcal{Q}_m}} \frac{1}{n_k} \sum_{t \in \mathcal{T}_{N_k}} \Lambda(x_t, s_{k,t}(c_{t,2D}, z_t)). \tag{16}$$

Here too, using the estimated loss to directly find the combination of $k$-th order 2-D sliding window denoisers in $\mathcal{S}^n_{k,\mathcal{Q}_m}$ that achieves (16) may require prohibitive complexity in $m$. Therefore, as in Section 3.4.1, we consider the 2-D data in the order of PH scanning, this time independently for each subsequence defined by each 2-D context $c \in \mathcal{C}_{k,2D}$. To that end, we define $\mathcal{S}^{PH(n)}_{k,m}$ that parallels (12) and [2, (28)] as

$$\mathcal{S}^{PH(n)}_{k,m} = \big\{ \mathbf{S}_k : \{s_{k,t}(c, \cdot)\}_{t \in \mathcal{T}(c)} \in \mathcal{S}^{PH(n(c))}_{0,m(c)} \text{ for all } c \in \mathcal{C}_{k,2D} \big\}, \tag{17}$$

where $n(c) = |\mathcal{T}(c)|$ and $m(c) = \min\{m, n(c)\}$. Although it may look complex, (17) is simply a set of combinations of $k$-th order sliding window denoisers that shift at most $m$ times along the PH-scanned subsequence for each 2-D context $c \in \mathcal{C}_{k,2D}$. With this notation and definitions, we define

$$\hat{\mathbf{S}}_{k,m} = \arg\min_{\mathbf{S}_k \in \mathcal{S}^{PH(n)}_{k,m}} \frac{1}{n_k} \sum_{\{i : PH_i \in \mathcal{T}_{N_k}\}} \ell(Z_{PH_i}, s_{k,PH_i}(c_{PH_i,2D}, \cdot)) \tag{18}$$

and the 2-D $(k, m)$-S-DUDE, $\hat{X}^{n,k,m}_{2D,\text{univ}}$, as $\hat{X}^{n,\hat{\mathbf{S}}_{k,m}}_{2D}$. By applying the dynamic programming algorithm of the 1-D case [2] to each PH-scanned subsequence defined by each context $c \in \mathcal{C}_{k,2D}$, we can find (18) with complexity linear in both $n$ and $m$. A subtle point to emphasize here is that although we apply the 1-D scheme on the PH-scanned 1-D data, the subsequences that we apply our algorithm on are defined by the 2-D $k$-th order contexts. In this way, our scheme still competes with all combinations of 2-D $k$-th order sliding window denoisers with high probability, as is established in the following result.

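Theorem 3 For all $\epsilon > 0$ and $x^{N \times N} \in \mathcal{X}^{N \times N}$,

$$\Pr\left( L_{\hat{X}^{n,k,m}_{2D,\text{univ}}}(x^{N \times N}, Z^{N \times N}) - D^{2D}_{k,m}(x^{N \times N}, Z^{N \times N}) > \epsilon \right) \le 2(k+1)^2 \exp\left( -\frac{n_k}{2(k+1)^2} \left[ \frac{\epsilon^2}{L_{\max}^2} - 2 |\mathcal{Z}|^{2k} \left\{ h\left(\frac{m}{n_k}\right) + \frac{(m+1) \ln |\mathcal{S}|}{n_k} \right\} \right] \right). \tag{19}$$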



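Proof: Once we define

$$D^{PH}_{k,m}(x^{N \times N}, z^{N \times N}) = \min_{\mathbf{S}_k \in \mathcal{S}^{PH(n)}_{k,m}} \frac{1}{n_k} \sum_{\{i : PH_i \in \mathcal{T}_{N_k}\}} \Lambda(x_{PH_i}, s_{k,PH_i}(c_{PH_i,2D}, z_{PH_i})),$$

we can again easily see that

$$D^{PH}_{k,m}(x^{N \times N}, z^{N \times N}) \le D^{2D}_{k,m}(x^{N \times N}, z^{N \times N})$$

for all $(x^{N \times N}, z^{N \times N})$, since $\mathcal{S}^{PH(n)}_{k,m}$ is a larger set than $\mathcal{S}^n_{k,\mathcal{Q}_m}$. Hence, proving the theorem reduces to showing that

$$\Pr\left( L_{\hat{X}^{n,k,m}_{2D,\text{univ}}}(x^{N \times N}, Z^{N \times N}) - D^{PH}_{k,m}(x^n_{PH}, Z^n_{PH}) > \epsilon \right)$$

is upper bounded by the right-hand side of (19), which can be derived by the identical argument as in [2, Theorem 3]. ■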

The following result, which is a direct consequence of the above theorem, can be considered the analogue of Theorem 1 a) for 2-D data. It shows that our algorithm is still universal, i.e., regardless of the underlying data, our algorithm asymptotically attains the optimum performance in the reference class.

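Theorem 4 Suppose $k = k_n$ and $m = m_n$ are such that (19) is summable in n, e.g., $k = c_1 \log n$ with
$c_1 < \frac{1}{2\log|\mathcal{Z}|}$ and $m = n^{\alpha}$ with $\alpha < 1$. Then, for all $x \in \mathcal{X}^{\infty}$, the sequence of denoisers
$\{\hat{X}^{k,m}_{\text{2-D univ}}\}$ satisfies

$$\lim_{N\to\infty} \Big[ L_{\hat{X}^{k,m}_{\text{2-D univ}}}(x^{N\times N}, Z^{N\times N}) - D^{\text{2-D}}_{k,m}(x^{N\times N}, Z^{N\times N}) \Big] = 0 \quad \text{a.s.}$$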



Proof: The proof combines the summability condition, the bound (19), the Borel-Cantelli lemma, and a simple use of
the union bound, similarly to the proof of the corresponding theorem in [2]. ■

3.5 Algorithm and complexity

We have shown that by applying the 1-D S-DUDE algorithm in [2, Section V-A] separately on the PH-scanned
subsequences for each 2-D context $c \in \mathcal{C}_k$, the resulting scheme can attain the performance of the best combination
of the k-th order denoisers that shifts across the m separate quadtree-decomposed regions. The pseudo-algorithm
for our 2-D S-DUDE is given below:

Algorithm 1 The two-dimensional (2-D) (k, m)-Shifting DUDE

Require: the dynamic-programming arrays of [2, Section V-A]: one in $\mathbb{R}^{(m+1)\times|\mathcal{S}|}$ and one in $\mathbb{R}^{|\mathcal{S}|}$ for each
    $t \in T_k$, three in $\mathbb{R}^{|\mathcal{C}_k|}$, and a scalar in $\mathbb{R}$
Ensure: $\hat{S}_{k,m} = \{\hat{s}_{k,t}(c_{t,2D}, \cdot)\}_{t \in T_k}$ in (18) and the denoised output $\{\hat{x}_t\}_{t \in T_k}$
for increasing order of PH-scanned index $\{i : PH_i \in T_k\}$ do
    identify the 2-D context $c_{PH_i,2D}$
    run the forward recursion of 1-D S-DUDE on PH-scanned points $\{PH_j : PH_j \in T(c_{PH_i,2D}),\ j \le i\}$
end for
for decreasing order of PH-scanned index $\{i : PH_i \in T_k\}$ do
    identify the 2-D context $c_{PH_i,2D}$
    run the backward recursion of 1-D S-DUDE on PH-scanned points $\{PH_j : PH_j \in T(c_{PH_i,2D}),\ j \ge i\}$
    identify the best single-symbol denoiser $\hat{s}_{k,PH_i}(c_{PH_i,2D}, \cdot)$ for location $PH_i$
    obtain the denoised symbol $\hat{x}_{PH_i} = \hat{s}_{k,PH_i}(c_{PH_i,2D}, z_{PH_i})$
end for
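
The outer structure of the two loops, namely visiting pixels in PH-scan order while routing each pixel to the
subsequence of its 2-D context, can be sketched as follows. This is only an illustrative sketch under stated
assumptions: the scan is taken to be the standard Hilbert curve on a 2^p x 2^p grid, the 8-neighbourhood is a
hypothetical stand-in for the 2-D contexts of Section 3.1, the interior pixels stand in for T_k, and all function
names are ours.

```python
from collections import defaultdict


def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate / reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def context_2d(img, r, c):
    """Hypothetical 2-D context: the 8 neighbours of (r, c), read row by row."""
    return tuple(img[r + dr][c + dc]
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0))


def subsequences_by_context(img):
    """Return {context: [(r, c), ...]}, each list ordered by Hilbert-scan index."""
    n = len(img)
    groups = defaultdict(list)
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        r, c = y, x
        if 0 < r < n - 1 and 0 < c < n - 1:  # interior pixels only (assumed T_k)
            groups[context_2d(img, r, c)].append((r, c))
    return groups
```

Each list returned by `subsequences_by_context` plays the role of one PH-scanned subsequence T(c), on which the
1-D recursions would then be run independently.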

Note that the PH scanning of the data and running the 1-D S-DUDE on those scanned points can be done
simultaneously, not separately. Therefore, the time and memory complexities of our algorithm are exactly the same
as those of 1-D S-DUDE: O(nm). Hence, competing with the QT-decomposed, shifting 2-D sliding window denoisers
for 2-D data is no harder than competing with shifting sliding window denoisers for 1-D data.
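
To illustrate why the cost grows only linearly in n and m, the following is a simplified dynamic program for a
single subsequence. It is not the forward/backward recursion of [2, Section V-A]; it is a plain Viterbi-style sketch
that assumes a hypothetical table `loss[t][s]` holding the estimated loss of applying single-symbol denoiser s at
position t, and it finds the combination with at most m shifts that minimizes the total.

```python
def best_shifting_combination(loss, m):
    """Minimise sum_t loss[t][choice[t]] over choices with at most m shifts.

    Returns (total_loss, choice). Runs in O(n * m * |S|^2) time, i.e. linear
    in n and m for a fixed number |S| of single-symbol denoisers.
    """
    n, num_s = len(loss), len(loss[0])
    inf = float("inf")
    # cost[j][s]: best loss so far, ending with denoiser s after exactly j shifts
    cost = [[loss[0][s] if j == 0 else inf for s in range(num_s)]
            for j in range(m + 1)]
    back = []  # back-pointers, one table per position t >= 1
    for t in range(1, n):
        new = [[inf] * num_s for _ in range(m + 1)]
        ptr = [[None] * num_s for _ in range(m + 1)]
        for j in range(m + 1):
            for s in range(num_s):
                best, arg = cost[j][s], (j, s)        # keep the same denoiser
                if j > 0:
                    for p in range(num_s):            # or shift from another one
                        if p != s and cost[j - 1][p] < best:
                            best, arg = cost[j - 1][p], (j - 1, p)
                new[j][s] = best + loss[t][s]
                ptr[j][s] = arg
        cost = new
        back.append(ptr)
    # pick the best terminal state and trace the choices backwards
    j, s = min(((j, s) for j in range(m + 1) for s in range(num_s)),
               key=lambda js: cost[js[0]][js[1]])
    total = cost[j][s]
    choice = [s]
    for ptr in reversed(back):
        j, s = ptr[j][s]
        choice.append(s)
    choice.reverse()
    return total, choice
```

In the 2-D scheme, a routine of this kind would be applied once per context, on the losses collected along the
corresponding PH-scanned subsequence.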

3.6 Remarks 

Before presenting the experimental results of our scheme, we make a few remarks regarding our algorithm and analyses
in the above subsections.

1. The performance guarantee results on expected (rather than actual) loss, and on a stochastic setting where
the noise-free image, rather than an individual data array, is a random field, can be derived similarly as in the
settings of [1] and [2], once equipped with the semi-stochastic setting results established above. We omit the
exercise for conciseness and to refrain from repetition.

2. It may also be natural to conceive of a denoising algorithm that heuristically finds a QT decomposition by
greedily merging child nodes as in [6], using the estimated loss. This scheme is practical, and may be competitive
with the best shifting sliding window denoisers based on QT decomposition, but it is difficult to analyze and to
equip with rigorous performance guarantees. On the other hand, as the results above guarantee, our scheme achieves
the best possible performance among all schemes in the same reference class, and is practically implementable.

3. The two components of our scheme, namely, QT decomposition and PH scanning, have been developed independently
in previous literature, but in the denoising setting, we see that the marriage of the two is natural
since they play complementary roles; the former efficiently segments data points that have similar characteristics
and the latter unfolds the 2-D data into 1-D data while preserving the local similarity attained by the
former.

4. Although our algorithm and analysis pertain exclusively to the 2-D data case, it is not hard to extend our
scheme to multi-dimensional data beyond the 2-D case, since analogously defining QT decomposition and
PH scanning for multi-dimensional data is straightforward.

4 Experimental Results 

As shown above, our 2-D S-DUDE enjoys considerable performance guarantees and an efficient implementation. However,
it is not clear how effective it might be in practice to compete with the reference class of QT decomposition-based
shifting sliding window denoisers. Hence, we show the performance of our scheme on three different sample images
and compare it with baseline schemes to highlight when the use of our algorithm could be advantageous.



4.1 Synthetic test image 

First, we show the denoising results for a synthetic image that showcases the benefit of our 2-D S-DUDE scheme.
Figure 6(a) shows the clean image that was constructed by pasting four binary sub-images that have different
characteristics, and Figure 6(b) is the noisy counterpart corrupted by a binary symmetric channel (BSC) with crossover
probability δ = 0.1. The total image size is 512 x 512 and each sub-image has size 256 x 256.

Figure 6: Synthetic test image and denoising results

We compare our scheme with three different baselines:

(i) 2-D DUDE: This simply generalizes the 1-D DUDE in [1] to the 2-D data case by using the 2-D contexts
defined in Section 3.1. Note that this scheme is already superior to many state-of-the-art image denoising
schemes, as reported in [1, Section VIII-C] and [3]. It has a single parameter k, the context size for the 2-D
context.

(ii) 1-D S-DUDE after raster scanning the data: This scheme first does the simple horizontal raster scan of
the image, then applies the 1-D S-DUDE developed in [2] on the resulting 1-D sequence. It is the scheme used
in [2, Section V-A] for image denoising experiments. The scheme has two parameters: k for the context size
of the 1-D context and m for the number of shifts.

(iii) 1-D DUDE after raster scanning the data: This scheme is a baseline, which coincides with the scheme in
(ii) when the number of shifts m is set to 0. In [2, Section V-A], (ii) was shown to be superior to this scheme
for images with characteristics that are abruptly changing.

One may think that the only difference between our 2-D S-DUDE and scheme (ii) is that we use the PH scan in
place of the raster scan, but there is also a subtle difference: we consider the subsequence points with respect to
the 2-D contexts, whereas (ii) simply considers the 1-D contexts from the raster-scanned 1-D sequence.
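
The difference between the two kinds of context can be made concrete with a small sketch. It assumes a two-sided
1-D context of order k taken from the row-major sequence and, as a hypothetical stand-in for the 2-D contexts of
Section 3.1, the 3 x 3 neighbourhood without its centre; the function names are ours.

```python
def raster_context_1d(img, r, c, k):
    """Two-sided 1-D context of order k around (r, c) in the row-major sequence."""
    n_cols = len(img[0])
    seq = [v for row in img for v in row]
    t = r * n_cols + c
    return tuple(seq[t - k:t]), tuple(seq[t + 1:t + k + 1])


def square_context_2d(img, r, c):
    """Hypothetical 2-D context: the 3 x 3 neighbourhood of (r, c) without its centre."""
    return tuple(img[r + dr][c + dc]
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0))


# Two 5 x 5 images that agree on the middle row but differ above and below it.
flat = [[0] * 5 for _ in range(5)]
striped = [[r % 2] * 5 for r in range(5)]

# The raster-scan context of the centre pixel cannot tell the images apart ...
same_1d = raster_context_1d(flat, 2, 2, 2) == raster_context_1d(striped, 2, 2, 2)
# ... whereas the 2-D context sees the neighbouring rows and can.
same_2d = square_context_2d(flat, 2, 2) == square_context_2d(striped, 2, 2)
```

Here `same_1d` is true and `same_2d` is false, so pixels that fall into one subsequence under scheme (ii) may be
split into different subsequences under the 2-D contexts.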

Figure 7 shows the bit error rate (BER) results for the four schemes. For (ii) and 2-D S-DUDE, we only show
the results for the best m value, which happened to be m = 4 for both schemes. First, we can see that the difference
between (ii) and (iii) is small. This is expected, since the characteristics of the raster-scanned 1-D sequence of
Figure 6(a) vary linearly with respect to the sequence length n, which violates the necessary
condition on the number of shifts m for (ii) to work, i.e., m should be sublinear in n as specified in [2, Theorem 5].
Second, interestingly, there is also no big difference between (i), which uses the 2-D context, and (ii) and (iii), which
use the 1-D context from the raster-scanned sequence. This probably shows that, for our image, the 1-D contexts from
the raster-scanned sequence are enough for capturing the locality of the image, but we do not know whether this is
a general phenomenon. Finally, we can observe that our 2-D S-DUDE with m = 4 clearly dominates all three
baseline schemes. Note that, by construction, it would be optimal to first decompose the image in Figure 6(b) into
four separate quadrants (quadtree with 4 leaves) and apply an independent denoiser in each of the four regions. We
see that by considering the noisy pixels in the order of PH scanning, our scheme, which knows nothing about the
underlying clean image, successfully learns the decomposition of the image and further reduces the BER compared
to the other baseline schemes.

Figure 7: Bit error rate comparison for the synthetic image
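
The role of the scan order for a four-quadrant image can be checked with a short sketch that counts how often each
scan crosses a quadrant boundary. It assumes the standard Hilbert curve as the PH scan and uses a small N x N grid
in place of the 512 x 512 image; the function names are ours.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate / reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def quadrant(r, c, n):
    """Label 0..3 of the quadrant that contains pixel (r, c)."""
    return 2 * (r >= n // 2) + (c >= n // 2)


def count_changes(labels):
    """Number of positions where consecutive labels differ."""
    return sum(a != b for a, b in zip(labels, labels[1:]))


def quadrant_changes(n):
    """Return (changes along the raster scan, changes along the Hilbert scan)."""
    raster = [quadrant(r, c, n) for r in range(n) for c in range(n)]
    hilbert = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        hilbert.append(quadrant(y, x, n))
    return count_changes(raster), count_changes(hilbert)


print(quadrant_changes(8))  # prints (15, 3)
```

Along the Hilbert scan each quadrant is traversed in one contiguous run, so three shifts suffice for this layout,
whereas the raster scan re-enters the quadrants on every row.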

The resulting denoised images for scheme (ii) and our 2-D S-DUDE are shown in Figure 6(c) and Figure 6(d),
respectively. In line with the BER plot in Figure 7, we visually see that 2-D S-DUDE is superior to scheme (ii)
not only in terms of the number of errors, but also in terms of detecting the boundaries of images and preserving
the textures. Particularly, the texts in Figure 6(d) are more readable and the boundary between the Einstein and
Yahoo! images is more clearly captured in Figure 6(d).



4.2 Scanned magazine image 

Although the result on the synthetic image is encouraging, one may suspect the image was constructed in favor
of 2-D S-DUDE, since it was divided into different sub-images corresponding to QT quadrants. We thus test our
algorithm on a real image. Figure 8(a) shows the clean binary image obtained from scanning a real magazine page.
Unlike the synthetic image in Section 4.1, this image represents the common and realistic characteristics of images
that have different textures in different regions. The image size is 512 x 512, and it is again corrupted by a BSC
with δ = 0.1, which led to the noisy image in Figure 8(b).
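
The corruption model and the error metric used in these experiments can be sketched as follows. This is a minimal
illustration, assuming the binary image is flattened into a list of 0/1 values; the function names `bsc` and
`bit_error_rate` are ours, not taken from the paper.

```python
import random


def bsc(bits, delta, rng):
    """Pass bits through a binary symmetric channel: flip each one with probability delta."""
    return [b ^ (rng.random() < delta) for b in bits]


def bit_error_rate(clean, estimate):
    """Fraction of positions where the estimate differs from the clean data."""
    return sum(a != b for a, b in zip(clean, estimate)) / len(clean)


rng = random.Random(0)          # fixed seed so that the run is reproducible
clean = [0, 1] * 5000           # toy binary data in place of an image
noisy = bsc(clean, 0.1, rng)
raw_ber = bit_error_rate(clean, noisy)   # close to the crossover probability 0.1
```

A denoiser is useful only if the BER of its output falls below `raw_ber`, the error rate of the noisy data itself.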

Here, we only compare our 2-D S-DUDE with 2-D DUDE, scheme (i) in the previous subsection, as there were
no significant differences between the other baselines in Section 4.1, and it is more natural to compare our scheme with
one that also uses the 2-D contexts. For our experiments, we varied the context size k from 1 to 9 for both
schemes and tried several values of m for 2-D S-DUDE. The BER plot in Figure 9 again shows that our 2-D S-DUDE
consistently outperforms 2-D DUDE, and the best BER is reduced by about 6%. This improvement is significant
since 2-D DUDE was already shown to outperform many of the state-of-the-art binary image denoising algorithms.
Moreover, we see that the 2-D S-DUDE achieves its optimum performance using a context length k that is half of that
used by the 2-D DUDE, resulting in overall lower complexity. That is, although 2-D S-DUDE introduces another
parameter m, the complexity is linear in m, and it reduces the dependency on k, which contributes exponentially
to the complexity. Figure 8(c) and Figure 8(d) respectively show the denoising results. We observe that the 2-D
S-DUDE not only has a smaller number of errors, but also does a better job than 2-D DUDE in preserving the
sub-image textures.

Figure 8: Scanned magazine image and denoising results. Panel (d): 2D S-DUDE (k = 4, m = 4)



4.3 Lena image 

We now show the results for the binary Lena image of size 512 x 512. The noisy channel is identical to that of the
previous subsections. Figure 10(a) and Figure 10(b) show the clean and noisy Lena images, and Figure 10(c) shows the
denoising results for both 2-D DUDE with k = 6 and 2-D S-DUDE with k = 6 and m = 4. Figure 10(d) shows the
BER plot for both 2-D DUDE and 2-D S-DUDE.

These results show what might be expected: when the image characteristics are largely homogeneous, the best
denoising performances of 2-D DUDE and 2-D S-DUDE are similar. However, as we can see in the BER plot in
Figure 10(d), 2-D S-DUDE reduces the BER faster than 2-D DUDE even with a smaller context size k by introducing
another parameter m. This can be beneficial when we do not know which k would be optimal for denoising in a
practical scenario.

Figure 9: Bit error rate comparison for the scanned magazine image

5 Concluding Remarks 

We have generalized the S-DUDE proposed in [2] to two-dimensional data. Due to the hardness of optimally
segmenting the 2-D data, we introduced a QT decomposition-based reference class of shifting 2-D sliding window
denoisers, then utilized the PH scanning technique to efficiently implement the scheme that can attain the optimum
performance in the reference class without knowing anything a priori about the characteristics of the underlying
clean data. Experimental results show that our scheme can be effective in further reducing the loss of 2-D DUDE,
especially for heterogeneous images consisting of sub-images of varying natures. Among other related lines of inquiry,
future work will investigate the effectiveness of combining more general data segmentation and scanning techniques.

Acknowledgement 

The authors are grateful to Erik Ordentlich for helpful discussions. 

References 

[1] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, and M. Weinberger, "Universal discrete denoising: Known
channel," IEEE Trans. Inform. Theory, vol. 51, no. 1, pp. 5-28, 2005.

[2] T. Moon and T. Weissman, "Discrete denoising with shifts," IEEE Trans. Inform. Theory, vol. 55, no. 11,
pp. 5284-5301, 2009.

[3] E. Ordentlich, G. Seroussi, S. Verdú, M. Weinberger, and T. Weissman, "A discrete universal denoiser and its
application to binary images," Proc. IEEE Int. Conf. on Image Processing, vol. 1, pp. 117-120, Sept. 2003.

[4] A. Lempel and J. Ziv, "Compression of two-dimensional data," IEEE Trans. Inform. Theory, vol. 32, no. 1,
pp. 2-8, 1986.

[5] H. Sagan, Space-Filling Curves. Springer-Verlag, 1994.

[6] E. Shusterman and M. Feder, "Image compression via improved quadtree decomposition algorithms," IEEE
Trans. Image Processing, vol. 3, no. 2, pp. 207-215, 1994.

[7] G. Sullivan and R. Baker, "Efficient quadtree coding of images and video," IEEE Trans. Image Processing,
vol. 3, no. 3, pp. 327-331, 1994.







Figure 10: Lena image and denoising results. (a) Clean image. (b) Noisy image. (c) 2D DUDE (k = 6) and 2D S-DUDE
(k = 6, m = 4). (d) BER plot, "BER for Lena Image": BER versus context length k.